ABSTRACT

We study the relation between PageRank and other parame- 
ters of information networks such as in-degree, out-degree, 
and the fraction of dangling nodes. We model this rela- 
tion through a stochastic equation inspired by the original 
definition of PageRank. Further, we use the theory of regu- 
lar variation to prove that PageRank and in-degree follow 
power laws with the same exponent. The two power laws differ
only in a multiplicative coefficient, which depends mainly on
the fraction of dangling nodes, the average in-degree, the
power law exponent, and the damping factor. The out-degree
distribution has a minor effect, which we explicitly
quantify. Our theoretical predictions show good
agreement with experimental data on three different sam- 
ples of the Web. 
1. INTRODUCTION

Originally created for Web ranking, PageRank has become 
a major method for evaluating popularity of nodes in infor- 
mation networks. Besides its primary application in search 
engines, PageRank is successfully used for solving other im- 
portant problems such as spam detection [20], graph parti- 
tioning [5], and finding gems in scientific citations [15], just 
to name a few. The PageRank [TJ] is defined as a station- 
ary distribution of a random walk on a set of Web pages. 
At each step, with probability c, the random walk follows a
randomly chosen outgoing link, and with probability 1 - c,
the walk starts afresh from a page chosen at random according
to some distribution f. Such a random jump also occurs
if a page is dangling, i.e. it does not have outgoing links.
In the original definition, the teleportation distribution f
is uniform over all Web pages. Then the PageRank values
satisfy the equation

$$PR(i) = c \sum_{j \to i} \frac{1}{d_j} PR(j) + \frac{c}{n} \sum_{j \in D} PR(j) + \frac{1 - c}{n}, \quad i = 1, \ldots, n, \qquad (1)$$

where PR(i) is the PageRank of page i, dj is the number 
of outgoing links of page j, the sum is taken over all pages 
j that link to page i, D is a set of dangling nodes, n is the 
number of pages in the Web, and c is the damping factor, 
which is a constant between 0 and 1.
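
As an illustration of equation (1), the following is a minimal sketch of the power iteration with uniform teleportation, in which the mass of dangling pages is redistributed uniformly. The function name `pagerank`, the adjacency-list input format, and the three-page toy graph are assumptions made for this example only; they do not come from the paper.

```python
def pagerank(out_links, c=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for equation (1) with uniform teleportation.

    out_links[j] lists the pages that page j links to; an empty list
    marks a dangling page.
    """
    n = len(out_links)
    pr = [1.0 / n] * n  # start from the uniform vector
    for _ in range(max_iter):
        # PageRank mass sitting on dangling pages is spread uniformly.
        dangling_mass = sum(pr[j] for j in range(n) if not out_links[j])
        new = [(c * dangling_mass + (1.0 - c)) / n] * n
        for j, targets in enumerate(out_links):
            if targets:
                share = c * pr[j] / len(targets)  # c * PR(j) / d_j
                for i in targets:
                    new[i] += share
        converged = sum(abs(a - b) for a, b in zip(new, pr)) < tol
        pr = new
        if converged:
            break
    return pr


# Hypothetical toy graph: 0 -> 1, 0 -> 2, 1 -> 2, and page 2 is dangling.
toy_graph = [[1, 2], [2], []]
pr = pagerank(toy_graph)
scale_free = [len(pr) * value for value in pr]  # n * PR(i)
```

The values in `pr` sum to one, while the rescaled values in `scale_free` average to one, which is the normalization used in the analysis.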

From equation (1) it is clear that the PageRank of a page
depends on the number and the popularity of the pages that link to it.
Thus, it can be expected that the distribution of PageRank
should be related to the distribution of in-degree, the number
of incoming links. Most experimental studies of the
Web agree that in-degree follows a power law with exponent
α ≈ 1.1 for the cumulative plot, which corresponds to the
famous value 2.1 for the density. Pandurangan et al. [27]
discovered that PageRank also follows a power law with
the same exponent. Further experiments [5], [16], [18]
confirmed this phenomenon. Mathematical justifications have
been proposed in [6], [19] for the preferential attachment
models [3], and in [21], where the relation between PageRank and
in-degree is modeled through a stochastic equation.

At this point, it is important to realize that PageRank is 
a global characteristic of the Web, which depends on in- 
degrees, out-degrees, correlations, and other characteristics 
of the underlying graph. In contrast to in-degrees, whose 
impact on the PageRank log-log plot is thoroughly explored 
and relatively well understood, the influence of out-degrees 
and dangling nodes has hardly received any attention in the 
literature. It is however a common belief that dangling 
nodes are important [17] whereas out-degrees (almost) do 
not affect the PageRank [15]. We also note that in the lit- 
erature, there is no common agreement on the out-degree 
distribution. On the Web data, Broder et al. [13] report a 
power law with exponent about 2.6 for the density, whereas 
e.g. Donato et al. [16] obtain a distribution which is clearly
not a power law. On the other hand, for Wikipedia [14],
out-degree seems to follow a power law with the same exponent
as in-degree.

In the present paper we investigate the relations between
PageRank and in/out-degrees, both analytically and
experimentally. Our analytical model is an extension of [24]. We
view the PageRank of a random page as a random variable R
that depends on other factors through a stochastic equation
resembling (1).

It is clear that the PageRank values in (1) scale as 1/n with
the number of pages. In the analysis, it is more convenient
to deal with the corresponding scale-free PageRank scores

$$R(i) = n\, PR(i), \quad i = 1, \ldots, n, \qquad (2)$$

assuming that n goes to infinity. In this setting, it is easier
to compare the probabilistic properties of PageRank and
in/out-degrees, which are also scale-free. In the remainder
of the paper, by PageRank we mean the scale-free PageRank
scores (2). Then the original definition (1) can be written
as

$$R(i) = c \sum_{j \to i} \frac{1}{d_j} R(j) + \frac{c}{n} \sum_{j \in D} R(j) + 1 - c, \quad i = 1, \ldots, n. \qquad (3)$$

We are concerned with the tail probability P(R > x), i.e.
the fraction of pages with PageRank greater than x, when x
is large. Our goal is to determine the asymptotic behavior
of P(R > x), that is, we want to find a known function r(x)
such that P(R > x)/r(x) → 1 as x → ∞. In this case, we
say that P(R > x) and r(x) are asymptotically equivalent,
which essentially means that for large enough x, P(R > x)
and r(x) are close, and their log-log plots look the same.
We formally describe power laws in terms of regularly varying
random variables, and we use recent results on regular
variation to obtain the PageRank asymptotics. To this end, we
provide a recurrent stochastic model for the power iteration
algorithm commonly used in PageRank computations [23],
and we obtain the PageRank asymptotics after each
iteration.

The analytical results suggest that PageRank and in-degree
follow power laws with the same exponent. The out-degrees
and dangling nodes affect only a multiplicative factor,
for which we find an exact expression. It follows that the
out-degree sequence has a truly minor influence, whereas the
fraction of dangling nodes has a slightly greater impact on
the multiplicative coefficient. The experiments on the Indochina-
2004 Web sample [TJ, on the EU-2005 Web sample [JJ, and 
on the Stanford Web 2 , show that our model correctly pre- 
dicts the evolution of the PageRank distribution through 
the series of power iterations, and it adequately captures 
the influence of the network parameters. 

2. PRELIMINARIES 

We start with preliminaries on the theory of regular
variation, which is a natural formalization of power laws. More
comprehensive details can be found, for instance, in [11].
We also refer to Jessen and Mikosch [22] for an excellent
recent review.

Definition 1. A function L(x) is slowly varying if for every
t > 0,

$$\frac{L(tx)}{L(x)} \to 1 \quad \text{as } x \to \infty.$$

Definition 2. A non-negative random variable X is said
to be regularly varying with index α if

$$P(X > x) \sim x^{-\alpha} L(x) \quad \text{as } x \to \infty, \qquad (4)$$

for some positive slowly varying function L(x).

Here, as in the remainder of this paper, the notation a(x) ~
b(x) means that a(x)/b(x) → 1.

The asymptotic equivalence (4) is a formalization of a power
law. In words, it means that for large enough x, the tail
distribution P(X > x) can be approximated by the regularly
varying function x^(-α) L(x), which is, in turn, approximately
proportional to x^(-α) due to the definition of L.
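
To make Definitions 1 and 2 concrete, here is a small numerical sketch. It checks that L(x) = log x behaves like a slowly varying function, and that for a pure power tail the same ratio stays at t^(-α). The helper names `ratio` and `pareto_tail`, and the choice of a Pareto tail with α = 1.1, are illustrative assumptions, not part of the paper.

```python
import math


def ratio(func, t, x):
    """func(t * x) / func(x); tends to 1 as x grows if func is slowly varying."""
    return func(t * x) / func(x)


def pareto_tail(x, alpha=1.1):
    """P(X > x) = x**(-alpha) for x >= 1: regularly varying with L(x) = 1."""
    return x ** (-alpha) if x >= 1.0 else 1.0


# log is slowly varying: the ratio approaches 1 as x increases.
slow = [ratio(math.log, 2.0, 10.0 ** e) for e in (2, 10, 100)]

# A power tail is not slowly varying: the ratio stays at t**(-alpha).
power = [ratio(pareto_tail, 2.0, 10.0 ** e) for e in (2, 10, 100)]
```

On a log-log plot this is why a regularly varying tail looks like a straight line with slope -α: the slowly varying factor changes too slowly to bend the line.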

Regularly varying random variables represent a subclass of 
a much broader class of long-tailed random variables. 

Definition 3. A random variable X is long-tailed if for any
y > 0,

$$P(X > x + y) \sim P(X > x) \quad \text{as } x \to \infty. \qquad (5)$$

The next lemma describes the behavior of products and random
sums of regularly varying random variables. The relation (i)
is known as Breiman's theorem (see e.g. Lemma 4.2.(1)
in [22]). Properties (ii) and (iii) are, respectively, statements
(2) and (5) of Lemma 3.7 in [22].

Lemma 1. (i) Assume that X_1 and X_2 are two independent
non-negative random variables such that X_1 is
regularly varying with index α and that E(X_2^(α+ε)) < ∞
for some ε > 0. Then

$$P(X_1 X_2 > x) \sim E(X_2^{\alpha})\, P(X_1 > x).$$

(ii) Assume that N is regularly varying with index α ≥ 0;
if α = 1, then assume that E(N) < ∞. Moreover,
let (X_i) be an i.i.d. sequence such that E(X_1) < ∞ and
P(X_1 > x) = o(P(N > x)). Then as x → ∞,

$$P\left(\sum_{i=1}^{N} X_i > x\right) \sim (E(X_1))^{\alpha}\, P(N > x).$$

(iii) Assume that P(N > x) ~ r P(X_1 > x) for some r > 0,
that X_1 is regularly varying with index α ≥ 1, and
E(X_1) < ∞. Then

$$P\left(\sum_{i=1}^{N} X_i > x\right) \sim \left(E(N) + r\, (E(X_1))^{\alpha}\right) P(X_1 > x).$$

3. THE MODEL 
3.1 In-degree 

It is common knowledge that in-degrees in the Web graph
obey a power law with exponent about 2.1 for the density,
which corresponds to 1.1 for the cumulative plot. The power
law exponent may deviate somewhat depending on a data 
set [S] and an estimator [2H]- As in our previous work [24] . 
we model the in-degree as an integer-valued regularly varying
random variable. To this end, we assume that the in-degree of
a random page is distributed as N(T), where T is regularly
varying with index α and N(t) is the number of Poisson
arrivals on the time interval [0, t] when the arrival rate is 1. If T
is regularly varying then N(T) is also regularly varying and
asymptotically identical to T (see e.g. [24]). Thus, N(T) is
indeed integer-valued and obeys the power law. To simplify the
notation, we will use N instead of N(T) throughout the paper.
The proposed formalization for the in-degree distribution
allows us to model the number of terms in the summation in (3).

3.2 Out-degree and inspection paradox 

Now, we want to model the weights 1/d_j in (3). Recall that
d_j is the out-degree of page j that has a link to page i. In
[24] we studied the relation between in-degree and PageRank
assuming that out-degrees of all pages are constant, equal
to the expected in-degree d. In this work, we go a step
further and allow for random out-degrees.

We model out-degrees of pages linking to a randomly chosen
page as independent and identically distributed random
variables with arbitrary distribution. Thus, consider a random
variable D, which represents the out-degree of a page
that links to a particular randomly chosen page i. Note that
D is not the same random variable as the out-degree of a random
page, since the additional information that a page has
a link to i alters the out-degree distribution. This famous
phenomenon, called the inspection paradox, finds its mathematical
explanation in renewal theory. The inspection paradox
roughly states that an interval containing a random point
tends to be larger than a randomly chosen interval [28]. For
instance, in [29], the number of children in a family to which a
randomly chosen child belongs is stochastically larger than
the number of children in a randomly chosen family. Likewise,
the number of out-links D from a page containing a random
link should be stochastically larger than the out-degree of
a random page. We will refer to D as the effective out-degree.
The term is motivated by the fact that the distribution of
D is the one that participates in the PageRank formula.

Now, let p_j be the fraction of pages with out-degree j ≥ 0.
Then we have

$$\lim_{n \to \infty} P(D = j) = \frac{j\, p_j}{d}, \quad j \ge 1, \qquad (6)$$

where d is the average in/out-degree, and n is the number
of pages in the Web. For sufficiently large networks, we may
assume that the distribution of D equals its limiting
distribution defined by (6). Note that, naturally, the probability
that a random link comes from a page with out-degree j
is proportional to j. This was implicitly observed by Fortunato
et al. in [18], who in fact used (6) in their computations
for the mean-field approximation of PageRank.
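
The size-biasing in (6) can be checked on a small example. The sketch below builds the effective out-degree distribution from a given out-degree distribution and compares the mean of D with the mean out-degree of a random page. The function name `effective_out_degree` and the distribution `p` are hypothetical and chosen only for illustration.

```python
def effective_out_degree(p):
    """Size-biased distribution of equation (6).

    p maps an out-degree j >= 0 to the fraction p_j of pages with that
    out-degree (p[0] is the fraction of dangling pages).
    Returns (d, q), where d is the mean out-degree and q[j] = j * p_j / d.
    """
    d = sum(j * pj for j, pj in p.items())
    q = {j: j * pj / d for j, pj in p.items() if j > 0}
    return d, q


# Hypothetical out-degree distribution, for illustration only.
p = {0: 0.2, 1: 0.3, 2: 0.3, 10: 0.2}
d, q = effective_out_degree(p)

mean_random_page = d                                 # mean out-degree of a random page
mean_effective = sum(j * qj for j, qj in q.items())  # mean of the effective out-degree D
mean_inverse = sum(qj / j for j, qj in q.items())    # E(1/D)
```

In this example `mean_effective` exceeds `mean_random_page`, as the inspection paradox suggests, and `mean_inverse` equals (1 - p_0)/d, because each term (1/j)(j p_j / d) reduces to p_j / d.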

3.3 Stochastic equation 

We view the scale-free PageRank of a random page as a
random variable R with E(R) = 1. Further, we assume that
the PageRank of a random page does not depend on
whether the page is dangling. Indeed, it can be shown
that the PageRank of a page cannot be altered significantly
by modifying outgoing links [7]. Moreover, experiments, e.g.
in [17], show that dangling nodes are often just regular pages
whose links have not been crawled, for instance, because it
was not allowed by robots.txt. Besides, even authentically
dangling pages such as .pdf or .ps files often contain
important information and gain a high ranking independently
of the fact that they do not have outgoing links. We note
that such independence implies that the average PageRank
of dangling nodes is 1, and thus the fraction of the total
PageRank mass concentrated in dangling nodes equals
the fraction of dangling nodes p_0:

$$p_0 = \frac{1}{n} \sum_{j \in D} R(j).$$

Our goal is to model and analyze to what extent the tail
probability P(R > x) for large enough x depends on the
in-degree N, the effective out-degree D, and the fraction
of dangling nodes p_0. To this end, we model PageRank R
as a solution of a stochastic equation involving N and D.
Inspired by the original formula (3), the stochastic equation
for the scale-free PageRank is as follows:

$$R \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j + [1 - c(1 - p_0)]. \qquad (7)$$

Here N, the R_j's and the D_j's are independent; the R_j's are
distributed as R, the D_j's are distributed as D, and
$a \stackrel{d}{=} b$ means that a
and b have the same probability distribution. As before,
c ∈ (0, 1) is the damping factor.

We note that the independence assumption for PageRanks
and effective out-degrees of pages linking to the same page
is obviously not true in general. However, there is no direct
relation between these values, and there is no experimental
evidence that such dependencies would crucially influence
the PageRank distribution. Thus, we assume independence
in this study.

The stochastic equation (7) is a generalization of the
equation analyzed in [24], where it was assumed that the D_j's are
constant. In order to demonstrate the applicability of our model,
we will use (7) to derive a mean-field approximation for the
PageRank of a page with given in-degree. It follows from (6)
that

$$E\left(\frac{1}{D}\right) = \sum_{j=1}^{\infty} \frac{1}{j} \cdot \frac{j\, p_j}{d} = \frac{1}{d} \sum_{j=1}^{\infty} p_j = \frac{1 - p_0}{d}.$$

Then, assuming that E(R_j) = 1, j = 1, 2, ..., we obtain

$$E(R \mid N) = \frac{c(1 - p_0)}{d} N + [1 - c(1 - p_0)]. \qquad (8)$$

If p_0 = 0 then this coincides with the mean-field
approximation by Fortunato et al. in [18], obtained directly from the
PageRank definition under minimal independence
assumptions and without considering dangling nodes.
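
As a quick sanity check of the mean-field formula (8), the sketch below evaluates it for a page with no in-links and for a page whose in-degree equals the average d. The function name `mean_field_pagerank` and the parameter values are hypothetical.

```python
def mean_field_pagerank(in_degree, c, p0, d):
    """E(R | N = in_degree) from equation (8), for the scale-free PageRank."""
    return c * (1.0 - p0) / d * in_degree + (1.0 - c * (1.0 - p0))


# Illustrative parameter values only (hypothetical network).
c, p0, d = 0.85, 0.1, 10.0
baseline = mean_field_pagerank(0, c, p0, d)  # page without in-links
average = mean_field_pagerank(d, c, p0, d)   # page with in-degree equal to d
```

A page with in-degree d gets expected scale-free PageRank 1, consistent with E(R) = 1, while a page without in-links keeps only the teleportation term 1 - c(1 - p_0).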

Equation (7) belongs to the class of stochastic recursive
equations that were discussed in detail in the recent survey
by Aldous and Bandyopadhyay [4]. In particular, (7) has an
apparent similarity with distributional equations motivated
by branching processes and branching random walks. Such
equations were studied in detail by Liu in [25] and his other
papers. Taking expectations in (8), we see that if E(R_j) = 1,
j = 1, 2, ..., then E(R) also equals 1. In Section 5 we will
show that (7) has a unique solution R such that E(R) = 1.




Figure 1: An example of a Galton-Watson tree

4. MODEL FOR POWER ITERATIONS 

In this section, we will introduce an iteration procedure
for solving (7). This procedure can be seen as a stochastic
model for the power iteration method commonly used
in PageRank computations. We first present the notation,
which is in line with Liu [25].

Let $\{(N_u, 1/D_{u1}, 1/D_{u2}, \ldots)\}_u$ be a family of independent
copies of $(N, 1/D_1, 1/D_2, \ldots)$ indexed by all finite sequences
$u = u_1 \ldots u_n$, $u_i \in \{1, 2, \ldots\}$. And let T be the
Galton-Watson tree with defining elements $\{N_u\}$: we have $\emptyset \in T$
and, if $u \in T$ and $i \in \{1, 2, \ldots\}$, then the concatenation $ui \in T$ if
and only if $1 \le i \le N_u$. In other words, we index the nodes
of the tree with root $\emptyset$ and the first-level nodes $1, 2, \ldots, N_{\emptyset}$, and
at every subsequent level, the ith offspring of u is named ui
(see Figure 1).

Now, we will iterate the equation (7). We start with an initial
distribution R^(0), E(R^(0)) = 1, and for every k ≥ 1, we
define the result of the kth iteration through the distributional
identity

$$R^{(k)} \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} R_j^{(k-1)} + [1 - c(1 - p_0)], \qquad (9)$$

where N, R_j^(k-1) and D_j, j ≥ 1, are independent. We argue
that if R^(0) = 1 then R^(k) serves as a stochastic model for
the result of the kth power iteration in standard PageRank
computations. Indeed, according to (9), for R^(1) we obtain

$$R^{(1)} \stackrel{d}{=} c \sum_{j=1}^{N} \frac{1}{D_j} + [1 - c(1 - p_0)],$$

which clearly corresponds to the first power iteration with
the initial uniform vector:

$$PR^{(1)}(i) = c \sum_{j \to i} \frac{1}{d_j} + [1 - c(1 - p_0)], \quad i = 1, \ldots, n.$$



This argument can be easily extended to further iterations. 

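Since the PageRank vector is always the result of a finite number
of iterations, it follows that R^(k) describes the distribution
of PageRank if the power iteration algorithm stops after k
steps. Assuming that in-degrees, effective out-degrees and
R_u^(0), u ∈ T, are independent, and repeatedly applying (9),
we derive the following representation for R^(k):

$$R^{(k)} \stackrel{d}{=} c^k \sum_{u = u_1 \ldots u_k \in T} \frac{1}{D_{u_1}} \cdots \frac{1}{D_{u_1 \ldots u_k}} R_u^{(0)} + [1 - c(1 - p_0)] \sum_{n=0}^{k-1} c^n Y^{(n)}, \quad k \ge 1, \qquad (10)$$

where $Y^{(0)} = 1$ and

$$Y^{(n)} = \sum_{u = u_1 \ldots u_n \in T} \frac{1}{D_{u_1}} \cdots \frac{1}{D_{u_1 \ldots u_n}}, \quad n \ge 1.$$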

The random variable Y^(n) represents the sum of the weights
of the nth level of the Galton-Watson tree, where the root
has weight 1, each edge has a random weight distributed as
1/D, and the weight of a node is the product of the weights of the
edges on the path from the root to this node.

In the subsequent analysis we will prove that the iterations R^(k),
k ≥ 1, converge to a unique solution of (7), and we will
obtain the tail behavior of R^(k) for each k ≥ 1. This will
give us the asymptotic behavior of the PageRank vector after
an arbitrary number of power iterations.
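
The recursion (9) can also be explored by simulation. The following sketch draws samples of R^(2) starting from R^(0) = 1 and checks that the sample mean is close to 1. It is a toy illustration under stated assumptions: the out-degree distribution `p` is hypothetical, and the in-degree N is drawn from the same light-tailed distribution (so that E(N) = d) rather than from the regularly varying N(T) used in the analysis, which keeps the Monte Carlo estimate stable.

```python
import random


def sample_iteration(rng, k, c, p0, in_degrees, in_weights,
                     out_degrees, out_weights):
    """Draw one sample of R^(k) from recursion (9), starting from R^(0) = 1."""
    if k == 0:
        return 1.0
    n_in = rng.choices(in_degrees, weights=in_weights, k=1)[0]  # in-degree N
    total = 0.0
    for _ in range(n_in):
        # Effective out-degree D of a linking page, size-biased as in (6).
        d_j = rng.choices(out_degrees, weights=out_weights, k=1)[0]
        r_j = sample_iteration(rng, k - 1, c, p0, in_degrees, in_weights,
                               out_degrees, out_weights)
        total += r_j / d_j
    return c * total + (1.0 - c * (1.0 - p0))


# Hypothetical toy network: p[j] is the fraction of pages with out-degree j.
p = {0: 0.2, 1: 0.3, 2: 0.3, 5: 0.2}
c, p0 = 0.85, p[0]
out_degrees = [j for j in p if j > 0]
out_weights = [j * p[j] for j in out_degrees]  # P(D = j) proportional to j * p_j

# Assumption for this demo only: N follows the out-degree law, so E(N) = d.
in_degrees, in_weights = list(p), list(p.values())

rng = random.Random(2024)
samples = [sample_iteration(rng, 2, c, p0, in_degrees, in_weights,
                            out_degrees, out_weights) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
```

Each recursive call corresponds to descending one level in the Galton-Watson tree, so the depth of the recursion equals the number of modeled power iterations.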

5. ANALYTICAL RESULTS 

First, we establish that our main stochastic equation (7)
indeed defines a unique distribution R that can serve as a
model for the PageRank of a random page. The result is
formally stated in the next theorem (the proof is given in
Section 8).



Theorem 1. Equation (7) has a unique non-trivial solution
with mean 1 given by

$$R^{(\infty)} = \lim_{k \to \infty} R^{(k)} = [1 - c(1 - p_0)] \sum_{n=0}^{\infty} c^n Y^{(n)}. \qquad (11)$$



Now we are ready to describe the tail behavior of R^(k), k ≥ 1,
which models the PageRank after k power iterations. The
main result is presented in Theorem 2 below.



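Theorem 2. If P(R^(0) > x) = o(P(N > x)), then for
all k ≥ 1,

$$P(R^{(k)} > x) \sim C_k\, P(N > x) \quad \text{as } x \to \infty,$$

where

$$C_k = \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} \sum_{j=0}^{k-1} c^{j\alpha} b^j \quad \text{and} \quad b = d\, E\left(\frac{1}{D^{\alpha}}\right) = \sum_{j=1}^{\infty} \frac{p_j}{j^{\alpha - 1}}.$$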

The form of the coefficient C_k arises from the proof, which
relies on the results from [22]. The proof is provided in
Section 8. For large enough k, C_k can be approximated by

$$C = \lim_{k \to \infty} C_k = \frac{c^{\alpha} (1 - p_0)^{\alpha}}{d^{\alpha} (1 - c^{\alpha} b)}.$$


From Jensen's inequality $E(1/D^{\alpha}) \ge (E(1/D))^{\alpha}$ and
(6), it follows that $b \ge (1 - p_0)^{\alpha} d^{1 - \alpha}$, and hence,

$$C \ge \frac{c^{\alpha} (1 - p_0)^{\alpha}}{d^{\alpha} \left(1 - c^{\alpha} (1 - p_0)^{\alpha} d^{1 - \alpha}\right)}. \qquad (12)$$

The last expression is the value of C if the out-degree of all
non-dangling nodes is a constant. Note that if α ≈ 1.1, then the
difference between the left- and the right-hand sides of (12)
is really small for any reasonable out-degree distribution.
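
To get a feeling for how close the two sides of (12) are, the sketch below computes b, the limiting constant C, and the lower bound from (12) for one out-degree distribution, and reports the gap as a vertical shift on a log10 scale. The distribution `p`, the values of α and c, and the helper names are assumptions chosen for illustration; they are not taken from the data sets of the paper.

```python
import math


def coefficient_b(p, alpha):
    """b = sum over j >= 1 of p_j / j**(alpha - 1), as defined in Theorem 2."""
    return sum(pj / j ** (alpha - 1.0) for j, pj in p.items() if j >= 1)


def limit_constant(c, p0, d, alpha, b):
    """C = c^alpha (1 - p0)^alpha / (d^alpha (1 - c^alpha b))."""
    return (c * (1.0 - p0) / d) ** alpha / (1.0 - c ** alpha * b)


# Hypothetical out-degree distribution, for illustration only.
p = {0: 0.2, 1: 0.3, 2: 0.3, 10: 0.2}
alpha, c = 1.1, 0.85
d = sum(j * pj for j, pj in p.items())

b = coefficient_b(p, alpha)
# Value of b when all non-dangling pages share one out-degree (Jensen bound).
b_constant = (1.0 - p[0]) ** alpha * d ** (1.0 - alpha)

C = limit_constant(c, p[0], d, alpha, b)
C_lower = limit_constant(c, p[0], d, alpha, b_constant)  # right-hand side of (12)
shift = math.log10(C / C_lower)  # vertical gap on the log-log plot
```

Even for this rather spread-out toy distribution the shift stays below 0.05 on the log10 scale, which is in line with the statement that the out-degree distribution has only a minor effect.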

From Theorem 2 we can draw interesting conclusions about
the relation between PageRank and in/out-degrees. As is
commonly known from experiments, the power law exponent
of the PageRank is the same as the power law exponent of
in-degree. Clearly, this exponent is not affected by
out-degrees. Thus, in-degree remains a major factor shaping
the PageRank distribution. The multiplicative factor C_k, k ≥ 1,
depends mainly on the mean in-degree d, the damping factor
c, and the fraction of non-dangling nodes (1 - p_0). The
values p_j, j ≥ 1, that specify the out-degree distribution
have some effect on the coefficient b, but this results in a
truly minor impact on the PageRank asymptotics. Hence,
our results confirm the common idea that the out-degree
distribution has very little influence on the PageRank,
but here we could also explicitly quantify this minor effect.
In the next section we will compare our analytical findings
with experimental results.

6. EXPERIMENTS 

We performed experiments on Indochina- 2004 and EU-2005 
Web samples collected by The Laboratory for Web Algo- 
rithmics (LAW), Dipartimento di Scienze dell'Informazione 
(DSI) of the Universit degli studi di Milano pp. We also 
used a Stanford-2002 Web sample [2] . In Figures 2-4 below 
we present cumulative log-log plots for in-degree/PageRank. 
The y-axis corresponds to the fraction of pages with in- 
degree/PageRank greater than the value on the x-axis. For 
in-degree, the power law exponent is evaluated using the
maximum likelihood estimator from [2B], and the straight 
line is fitted accordingly. For the PageRank, we plot the 
theoretically predicted straight lines obtained from Theorem 2.

The Indochina set contains 7414866 nodes and 194109311 
links. The results are presented in Figure 2 below. The in- 
degree plot resembles a power law except for the excessively 
large fraction of pages with in-degree about 10^4. We
suspect that this irregularity might be related to the specific
crawling technique [TO]. For more detail on this data set 
see [8]. For Indochina, we obtain a power law exponent 1.17 
for cumulative plot, which is quite different from the result 
in [8]. This demonstrates the sensitivity of estimators for
the power law exponent. Indeed, the exponent 0.6 in [8]
reflects the behavior in the first part of the plot, whereas 1.17
gives more weight to the tail of the in-degree distribution.

We fit the straight line y = -1.17x + 0.80 into the in-degree
plot and then compute the distance

$$\log_{10}(C) = \log_{10}\left(\frac{c^{\alpha} (1 - p_0)^{\alpha}}{d^{\alpha} (1 - c^{\alpha} b)}\right)$$

between the in-degree and the PageRank log-log plots for
c = 0.2, 0.5, and 0.85. With d = 26.17, p_0 = 0.18, and b =
0.65, we obtain the following prediction for the PageRank
log-log plot: y = -1.17x - 1.73 for c = 0.2, y = -1.17x -
1.16 for c = 0.5, and y = -1.17x - 0.70 for c = 0.85. In
Figure 2 we show these theoretically predicted lines and the
experimental PageRank log-log plots. We see that for this
data set, our model provides a linear fit with striking
accuracy.

Figure 2: Indochina data set: cumulative log-log
plots for in-degree/PageRank. The straight lines for
the PageRank plots are predicted by the model.

We performed the same experiment for EU-2005, which has 862664
nodes and 19235140 links. In this data set the in-degree shows
typical power law behavior, which is fitted perfectly by
y = -1.1x + 0.61. We use the same approach to calculate the
difference between the in-degree and PageRank plots for d =
22.3, p_0 = 0.08, b = 0.70. Thus, the theoretical predictions for
the PageRank are y = -1.1x - 1.63, y = -1.1x - 1.07, and
y = -1.1x - 0.60 for c = 0.2, 0.5, and 0.85, respectively. The
log-log plots for experimental data, the fitted straight line
for in-degree, and the corresponding theoretical straight lines
for PageRank are presented in Figure 3.

Figure 3: EU-2005 data set: cumulative log-log plots
for in-degree/PageRank. The straight lines for the
PageRank plots are predicted by the model.







Finally, we verify our model for power iterations. For that,
we use a smaller Web sample from [2] that contains 281903
pages and more than 2.3 million links. In Figure 4 we show the
cumulative log-log plot of in-degree, and the log-log plots
of the PageRank after the 1st, the 2nd, and the last power
iterations for the damping factor 0.85. To predict the
difference between in-degree and the PageRank iterations we use
the result of Theorem 2 for d = 8.2032, p_0 = 0.006, and
b = 0.8558. Thus, if the in-degree distribution is fitted by
y = -1.1x + 0.08, then y = -1.1x - 1.00, y = -1.1x - 0.77,
and y = -1.1x - 0.46 are the predicted PageRank lines
after the 1st, the 2nd, and the last power iterations,
respectively. Although the obtained lines do not match
the PageRank distribution perfectly, we see that our model correctly
captures the dynamics of the PageRank distribution in
successive power iterations. The difference between the
theoretical prediction and the real data might occur because of the
specific structure of this data set. For instance, the number
of dangling nodes in this Web sample is negligibly small,
which is not true for the real Web.
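
The intercepts quoted in this section can be re-derived from Theorem 2 with a few lines of arithmetic. The sketch below adds log10 of the constant C (or of C_k for a finite number of iterations) to the intercept of the fitted in-degree line. The parameter values are the ones reported in the text; the exponent α is taken as the absolute slope of the fitted in-degree line (1.17 for Indochina, 1.1 for EU-2005 and Stanford), and the function and variable names are assumptions of this example.

```python
import math


def constant_after_k_iterations(k, c, p0, d, alpha, b):
    """C_k from Theorem 2: (c(1 - p0)/d)^alpha * sum_{j<k} (c^alpha b)^j."""
    base = (c * (1.0 - p0) / d) ** alpha
    return base * sum((c ** alpha * b) ** j for j in range(k))


def limit_constant(c, p0, d, alpha, b):
    """C = lim C_k = (c(1 - p0)/d)^alpha / (1 - c^alpha b)."""
    return (c * (1.0 - p0) / d) ** alpha / (1.0 - c ** alpha * b)


# Parameter values as reported in the text.
indochina = dict(p0=0.18, d=26.17, alpha=1.17, b=0.65)
eu2005 = dict(p0=0.08, d=22.3, alpha=1.1, b=0.70)
stanford = dict(c=0.85, p0=0.006, d=8.2032, alpha=1.1, b=0.8558)

dampings = (0.2, 0.5, 0.85)
# Predicted intercept = in-degree intercept + log10 of the constant.
indochina_lines = [0.80 + math.log10(limit_constant(c=c, **indochina))
                   for c in dampings]
eu2005_lines = [0.61 + math.log10(limit_constant(c=c, **eu2005))
                for c in dampings]
stanford_lines = [
    0.08 + math.log10(constant_after_k_iterations(1, **stanford)),
    0.08 + math.log10(constant_after_k_iterations(2, **stanford)),
    0.08 + math.log10(limit_constant(**stanford)),
]
```

Rounded to two decimals, the computed intercepts agree with the reported lines to within about 0.01, for example roughly -1.73, -1.16 and -0.71 for Indochina.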

Figure 4: Stanford data set: cumulative log-log plots
for in-degree/PageRank. The straight lines for the
PageRank plots are predicted by the model for the
1st, the 2nd, and the last power iterations.







7. DISCUSSION 

In this paper, we proposed an analytical stochastic model
that helps to predict the shape of the PageRank log-log plot
on the basis of the in-degree distribution, the damping factor, and
the fraction of dangling nodes. It also follows from the model
that the out-degree distribution has a truly minor impact on
the PageRank. To make our mathematical model analytically
tractable, we had to make several simplifying
assumptions, such as independence of certain parameters and
uniform teleportation. Experiments show that our theoretical
model matches the Web data with good accuracy.

One can argue that a uniform teleportation vector f is no longer
suitable for Web ranking [17]. Indeed, there are smarter
choices of f that take into account the user's preferences, favor
certain topics related to a query [21], or give higher weights
to trusted pages for eliminating spam [17]. The goal of
this paper, however, was not to improve Web ranking but
rather to analyze why the PageRank vector has certain
properties reflected in its log-log plot. In order to capture the
influence of in- and out-degrees, we had to make simplifying
assumptions on other factors. However, we believe that our
approach is promising for modeling relations between
different parameters in the Web. In further research, we plan to
gradually improve our model by including dependencies,
personalization, and other important factors relevant for
contemporary Web search.

8. PROOFS 

Proof of Theorem 1. First, we establish that R^(∞) is a
well-defined random variable. We consider some initial
distribution R^(0) with E(R^(0)) = 1. Then the first part of (10)
has mean c^k (1 - p_0)^k, and hence it converges in
probability to 0 because, by the Markov inequality, the
probability that this term is greater than some ε > 0 is at most
c^k (1 - p_0)^k / ε → 0 as k → ∞. Further, since (1 - p_0)^(-n) Y^(n)
is a martingale with mean 1, and the limit of (1 - p_0)^(-n) Y^(n) as n → ∞
exists and is finite (see [25]), the second part of (10)
converges a.s. to R^(∞) as k → ∞. It follows that (10) converges
to R^(∞) in probability and, according to the monotone
convergence theorem,

$$E(R^{(\infty)}) = [1 - c(1 - p_0)] \lim_{k \to \infty} \sum_{n=0}^{k} c^n E(Y^{(n)}) = 1.$$

It is easy to verify that R^(∞) in (11) is a solution of (7).
To prove the uniqueness, we assume that there is another
solution with mean 1, then we take this solution as an
initial distribution and repeat the argument above.
Thus, we can conclude that there is no other fixed point of
(7) with mean 1 except R^(∞). □

Proof of Theorem 2. We will use induction. For
k = 1, we derive

$$P(R^{(1)} > x) \sim P\left(c \sum_{j=1}^{N} \frac{1}{D_j} R_j^{(0)} + [1 - c(1 - p_0)] > x\right)$$

$$\sim \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} P(N > x - [1 - c(1 - p_0)])$$

$$\sim C_1 P(N > x) \quad \text{as } x \to \infty,$$

where the second relation follows from Lemma 1(ii)
because E(N) = d < ∞, E(R^(0)) = 1, E(c D^(-1) R^(0)) =
c(1 - p_0) d^(-1) < ∞, and P(c D^(-1) R^(0) > x) = o(P(N > x)),
and the last relation follows from (5).

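Now, assume that the result has been shown for the (k - 1)th
iteration, k ≥ 2. Then Lemma 1(i) yields

$$P\left(\frac{c}{D} R^{(k-1)} > x\right) \sim c^{\alpha} E\left(\frac{1}{D^{\alpha}}\right) C_{k-1} P(N > x) = \frac{c^{\alpha}}{d}\, b\, C_{k-1} P(N > x),$$

where

$$E\left(\frac{1}{D^{\alpha}}\right) = \sum_{j=1}^{\infty} \frac{1}{j^{\alpha}} \cdot \frac{j\, p_j}{d} = \frac{1}{d} \sum_{j=1}^{\infty} \frac{p_j}{j^{\alpha - 1}} = \frac{1}{d}\, b.$$

Then, since E(c D^(-1) R^(k-1)) = c(1 - p_0) d^(-1) < ∞ and
E(N) = d, we apply Lemma 1(iii) to obtain

$$P(R^{(k)} > x) \sim P\left(c \sum_{j=1}^{N} \frac{1}{D_j} R_j^{(k-1)} + [1 - c(1 - p_0)] > x\right)$$

$$\sim \left(c^{\alpha} b\, C_{k-1} + \left(\frac{c(1 - p_0)}{d}\right)^{\alpha}\right) P(N > x - [1 - c(1 - p_0)])$$

$$\sim \left(c^{\alpha} b\, C_{k-1} + \left(\frac{c(1 - p_0)}{d}\right)^{\alpha}\right) P(N > x) \quad \text{as } x \to \infty,$$

for any k ≥ 2. Here the last relation again follows from the
property of long-tailed random variables (5).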

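Then for the constant C_k we have

$$C_k = c^{\alpha} b\, C_{k-1} + \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} = c^{\alpha} b \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} \sum_{j=0}^{k-2} c^{j\alpha} b^j + \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} = \left(\frac{c(1 - p_0)}{d}\right)^{\alpha} \sum_{j=0}^{k-1} c^{j\alpha} b^j.$$

□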